Overview
Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 19768 |
| Missing cells | 12941 |
| Missing cells (%) | 3.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 8.1 MiB |
| Average record size in memory | 430.0 B |
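The percentage of missing cells and the total memory size follow from the other figures in this table. A minimal sketch of that arithmetic, assuming "cells" means variables times observations and that the total size is the average record size times the number of observations:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 19
n_observations = 19768
missing_cells = 12941
avg_record_bytes = 430.0

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
total_mib = avg_record_bytes * n_observations / 2**20  # 1 MiB = 2**20 bytes

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"approx. size in memory: {total_mib:.1f} MiB")
```

Both derived values round to the figures shown in the table (3.4% and 8.1 MiB).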
Variable types
| Categorical | 2 |
|---|---|
| Boolean | 6 |
| Text | 1 |
| Numeric | 8 |
| DateTime | 2 |
Alerts
| Alert | Type |
|---|---|
| site_admin has constant value "True" | Constant |
| followers is highly overall correlated with following and 6 other fields | High correlation |
| following is highly overall correlated with followers and 4 other fields | High correlation |
| label is highly overall correlated with text_bot_count | High correlation |
| log_followers is highly overall correlated with followers and 6 other fields | High correlation |
| log_following is highly overall correlated with followers and 4 other fields | High correlation |
| log_public_gists is highly overall correlated with followers and 4 other fields | High correlation |
| log_public_repos is highly overall correlated with followers and 6 other fields | High correlation |
| public_gists is highly overall correlated with followers and 4 other fields | High correlation |
| public_repos is highly overall correlated with followers and 6 other fields | High correlation |
| text_bot_count is highly overall correlated with label and 1 other field | High correlation |
| type is highly overall correlated with text_bot_count | High correlation |
| label is highly imbalanced (67.2%) | Imbalance |
| type is highly imbalanced (92.8%) | Imbalance |
| text_bot_count is highly imbalanced (88.7%) | Imbalance |
| bio has 10929 (55.3%) missing values | Missing |
| followers has 816 (4.1%) missing values | Missing |
| log_followers has 816 (4.1%) missing values | Missing |
| public_repos has 942 (4.8%) zeros | Zeros |
| public_gists has 7961 (40.3%) zeros | Zeros |
| followers has 1445 (7.3%) zeros | Zeros |
| following has 6017 (30.4%) zeros | Zeros |
| log_public_repos has 942 (4.8%) zeros | Zeros |
| log_public_gists has 7961 (40.3%) zeros | Zeros |
| log_followers has 1445 (7.3%) zeros | Zeros |
| log_following has 6017 (30.4%) zeros | Zeros |
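The report does not say how the imbalance percentages are computed. They appear consistent with one minus the Shannon entropy of the class shares, normalized by the maximum entropy for that number of classes. The sketch below works under that assumption, using the class counts listed in the per-variable sections; `imbalance_score` is a hypothetical helper name, not a ydata-profiling API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for balanced classes, close to 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch: a single class is not scored
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts taken from the label, type and text_bot_count sections of the report.
class_counts = {
    "label": [18578, 1190],
    "type": [19597, 171],
    "text_bot_count": [19003, 425, 251, 75, 9, 5],
}
for name, counts in class_counts.items():
    print(f"{name}: {100 * imbalance_score(counts):.1f}%")
```

Under this assumption the three scores should land close to the 67.2%, 92.8% and 88.7% reported in the alerts.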
Reproduction
| Analysis started | 2024-11-26 05:00:45.720662 |
|---|---|
| Analysis finished | 2024-11-26 05:01:03.883360 |
| Duration | 18.16 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
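The reported duration can be checked against the two timestamps. A small sketch using only the standard library, with the timestamps copied from the table:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-11-26 05:00:45.720662")
finished = datetime.fromisoformat("2024-11-26 05:01:03.883360")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```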
Variables
label
Categorical
High correlation  Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| Human | 18578 |
|---|---|
| Bot | 1190 |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.8796034 |
| Min length | 3 |
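The mean length is a count-weighted average of the two label lengths ("Human" has 5 characters, "Bot" has 3). A short sketch of that calculation, using the counts from this section:

```python
# Label counts as reported for this variable.
label_counts = {"Human": 18578, "Bot": 1190}

total_chars = sum(len(value) * n for value, n in label_counts.items())
n_rows = sum(label_counts.values())
mean_length = total_chars / n_rows

print(f"total characters: {total_chars}")
print(f"mean length: {mean_length:.7f}")
```

The total of 96460 characters is the same figure that appears as the count in the character tables for this variable.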
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Human |
|---|---|
| 2nd row | Human |
| 3rd row | Human |
| 4th row | Bot |
| 5th row | Human |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Human | 18578 | 94.0% |
| Bot | 1190 | 6.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| H | 18578 | 19.3% |
| u | 18578 | 19.3% |
| m | 18578 | 19.3% |
| a | 18578 | 19.3% |
| n | 18578 | 19.3% |
| B | 1190 | 1.2% |
| o | 1190 | 1.2% |
| t | 1190 | 1.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 96460 | 100.0% |
type
Boolean
High correlation  Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 19.4 KiB |
| True | 19597 |
|---|---|
| False | 171 |
| Value | Count | Frequency (%) |
|---|---|---|
| True | 19597 | 99.1% |
| False | 171 | 0.9% |
site_admin
Boolean
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 19.4 KiB |
| True | 19768 |
|---|---|
| Value | Count | Frequency (%) |
|---|---|---|
| True | 19768 | 100.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| True | 10794 | 54.6% |
| False | 8974 | 45.4% |

| Value | Count | Frequency (%) |
|---|---|---|
| False | 11256 | 56.9% |
| True | 8512 | 43.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| True | 12691 | 64.2% |
| False | 7077 | 35.8% |

| Value | Count | Frequency (%) |
|---|---|---|
| False | 16470 | 83.3% |
| True | 3298 | 16.7% |
bio
Text
Missing 
| Distinct | 8641 |
|---|---|
| Distinct (%) | 97.8% |
| Missing | 10929 |
| Missing (%) | 55.3% |
| Memory size | 1.6 MiB |
Length
| Max length | 160 |
|---|---|
| Median length | 116 |
| Mean length | 61.460459 |
| Min length | 1 |
Unique
| Unique | 8574 |
|---|---|
| Unique (%) | 97.0% |
Sample
| 1st row | I just press the buttons randomly, and the program evolves... |
|---|---|
| 2nd row | Time is unimportant, only life important. |
| 3rd row | Done studying. Need challenges. |
| 4th row | Administrator of MOONGIFT that is introducing open source software everyday to Japanese engineers since 2004. |
| 5th row | Senior Software Engineer at Google, working on Certificate Transparency and generalized transparency. |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| (blank) | 3069 | 3.9% |
| and | 2526 | 3.2% |
| engineer | 1583 | 2.0% |
| software | 1521 | 1.9% |
| of | 1488 | 1.9% |
| at | 1380 | 1.8% |
| developer | 1236 | 1.6% |
| the | 1086 | 1.4% |
| a | 1038 | 1.3% |
| i | 1033 | 1.3% |
| Other values (14754) | 62407 | 79.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 70014 | 12.9% |
| e | 49589 | 9.1% |
| o | 32360 | 6.0% |
| n | 31402 | 5.8% |
| a | 31366 | 5.8% |
| t | 31195 | 5.7% |
| r | 31181 | 5.7% |
| i | 28526 | 5.3% |
| s | 19655 | 3.6% |
| l | 14767 | 2.7% |
| Other values (1736) | 203194 | 37.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 543249 | 100.0% |
public_repos
Real number (ℝ)
High correlation  Zeros 
| Distinct | 594 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 82 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 65.856243 |
| Minimum | 0 |
| Maximum | 994 |
| Zeros | 942 |
| Zeros (%) | 4.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 11 |
| median | 34 |
| Q3 | 82 |
| 95-th percentile | 240 |
| Maximum | 994 |
| Range | 994 |
| Interquartile range (IQR) | 71 |
Descriptive statistics
| Standard deviation | 92.912014 |
|---|---|
| Coefficient of variation (CV) | 1.4108308 |
| Kurtosis | 16.929526 |
| Mean | 65.856243 |
| Median Absolute Deviation (MAD) | 28 |
| Skewness | 3.3968422 |
| Sum | 1296446 |
| Variance | 8632.6424 |
| Monotonicity | Not monotonic |
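Several of these statistics are derived from one another, which gives a quick consistency check on the table. The sketch below recomputes them from the reported public_repos values; it assumes the mean is taken over non-missing observations only (19768 rows minus 82 missing).

```python
# Values copied from the public_repos tables.
std = 92.912014
mean = 65.856243
q1, q3 = 11, 82
total_sum = 1296446
n_observations = 19768
n_missing = 82

cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
iqr = q3 - q1                      # interquartile range
mean_from_sum = total_sum / (n_observations - n_missing)

print(f"CV: {cv:.7f}")
print(f"variance: {variance:.4f}")
print(f"IQR: {iqr}")
print(f"mean from sum: {mean_from_sum:.6f}")
```

Each recomputed value agrees with the corresponding table entry up to rounding of the inputs.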
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 942 | 4.8% |
| 1 | 551 | 2.8% |
| 2 | 465 | 2.4% |
| 3 | 396 | 2.0% |
| 4 | 380 | 1.9% |
| 6 | 364 | 1.8% |
| 5 | 357 | 1.8% |
| 7 | 330 | 1.7% |
| 9 | 312 | 1.6% |
| 8 | 307 | 1.6% |
| Other values (584) | 15282 | 77.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 942 | 4.8% |
| 1 | 551 | 2.8% |
| 2 | 465 | 2.4% |
| 3 | 396 | 2.0% |
| 4 | 380 | 1.9% |
| 5 | 357 | 1.8% |
| 6 | 364 | 1.8% |
| 7 | 330 | 1.7% |
| 8 | 307 | 1.6% |
| 9 | 312 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 994 | 1 | < 0.1% |
| 992 | 1 | < 0.1% |
| 985 | 1 | < 0.1% |
| 968 | 1 | < 0.1% |
| 949 | 1 | < 0.1% |
| 941 | 2 | < 0.1% |
| 929 | 1 | < 0.1% |
| 924 | 1 | < 0.1% |
| 915 | 1 | < 0.1% |
| 893 | 1 | < 0.1% |
public_gists
Real number (ℝ)
High correlation  Zeros 
| Distinct | 335 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 24 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.080531 |
| Minimum | 0 |
| Maximum | 964 |
| Zeros | 7961 |
| Zeros (%) | 40.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 10 |
| 95-th percentile | 65 |
| Maximum | 964 |
| Range | 964 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 43.585263 |
|---|---|
| Coefficient of variation (CV) | 3.0954275 |
| Kurtosis | 128.63629 |
| Mean | 14.080531 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 9.1069883 |
| Sum | 278006 |
| Variance | 1899.6751 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7961 | 40.3% |
| 1 | 1873 | 9.5% |
| 2 | 1152 | 5.8% |
| 3 | 823 | 4.2% |
| 4 | 665 | 3.4% |
| 5 | 627 | 3.2% |
| 6 | 488 | 2.5% |
| 7 | 405 | 2.0% |
| 9 | 327 | 1.7% |
| 8 | 318 | 1.6% |
| Other values (325) | 5105 | 25.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7961 | 40.3% |
| 1 | 1873 | 9.5% |
| 2 | 1152 | 5.8% |
| 3 | 823 | 4.2% |
| 4 | 665 | 3.4% |
| 5 | 627 | 3.2% |
| 6 | 488 | 2.5% |
| 7 | 405 | 2.0% |
| 8 | 318 | 1.6% |
| 9 | 327 | 1.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 964 | 1 | < 0.1% |
| 958 | 1 | < 0.1% |
| 947 | 1 | < 0.1% |
| 905 | 1 | < 0.1% |
| 892 | 1 | < 0.1% |
| 878 | 1 | < 0.1% |
| 877 | 1 | < 0.1% |
| 876 | 1 | < 0.1% |
| 831 | 1 | < 0.1% |
| 791 | 1 | < 0.1% |
followers
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 891 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 816 |
| Missing (%) | 4.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 95.517307 |
| Minimum | 0 |
| Maximum | 999 |
| Zeros | 1445 |
| Zeros (%) | 7.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 7 |
| median | 30 |
| Q3 | 104 |
| 95-th percentile | 450.45 |
| Maximum | 999 |
| Range | 999 |
| Interquartile range (IQR) | 97 |
Descriptive statistics
| Standard deviation | 161.27742 |
|---|---|
| Coefficient of variation (CV) | 1.6884628 |
| Kurtosis | 8.9225762 |
| Mean | 95.517307 |
| Median Absolute Deviation (MAD) | 28 |
| Skewness | 2.8536948 |
| Sum | 1810244 |
| Variance | 26010.407 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1445 | 7.3% |
| 1 | 803 | 4.1% |
| 2 | 623 | 3.2% |
| 3 | 515 | 2.6% |
| 4 | 450 | 2.3% |
| 5 | 415 | 2.1% |
| 6 | 396 | 2.0% |
| 7 | 347 | 1.8% |
| 8 | 338 | 1.7% |
| 9 | 311 | 1.6% |
| Other values (881) | 13309 | 67.3% |
| (Missing) | 816 | 4.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1445 | 7.3% |
| 1 | 803 | 4.1% |
| 2 | 623 | 3.2% |
| 3 | 515 | 2.6% |
| 4 | 450 | 2.3% |
| 5 | 415 | 2.1% |
| 6 | 396 | 2.0% |
| 7 | 347 | 1.8% |
| 8 | 338 | 1.7% |
| 9 | 311 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 999 | 2 | < 0.1% |
| 997 | 2 | < 0.1% |
| 995 | 1 | < 0.1% |
| 993 | 3 | < 0.1% |
| 992 | 2 | < 0.1% |
| 989 | 1 | < 0.1% |
| 988 | 3 | < 0.1% |
| 987 | 2 | < 0.1% |
| 985 | 1 | < 0.1% |
| 984 | 2 | < 0.1% |
following
Real number (ℝ)
High correlation  Zeros 
| Distinct | 536 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 84 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.964641 |
| Minimum | 0 |
| Maximum | 997 |
| Zeros | 6017 |
| Zeros (%) | 30.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 21.25 |
| 95-th percentile | 137.85 |
| Maximum | 997 |
| Range | 997 |
| Interquartile range (IQR) | 21.25 |
Descriptive statistics
| Standard deviation | 78.829215 |
|---|---|
| Coefficient of variation (CV) | 2.7215671 |
| Kurtosis | 45.341375 |
| Mean | 28.964641 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 5.9196502 |
| Sum | 570140 |
| Variance | 6214.0451 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6017 | 30.4% |
| 1 | 1734 | 8.8% |
| 2 | 1092 | 5.5% |
| 3 | 794 | 4.0% |
| 4 | 602 | 3.0% |
| 5 | 533 | 2.7% |
| 6 | 484 | 2.4% |
| 7 | 407 | 2.1% |
| 8 | 368 | 1.9% |
| 9 | 322 | 1.6% |
| Other values (526) | 7331 | 37.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6017 | 30.4% |
| 1 | 1734 | 8.8% |
| 2 | 1092 | 5.5% |
| 3 | 794 | 4.0% |
| 4 | 602 | 3.0% |
| 5 | 533 | 2.7% |
| 6 | 484 | 2.4% |
| 7 | 407 | 2.1% |
| 8 | 368 | 1.9% |
| 9 | 322 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 997 | 1 | < 0.1% |
| 993 | 1 | < 0.1% |
| 991 | 1 | < 0.1% |
| 980 | 1 | < 0.1% |
| 969 | 1 | < 0.1% |
| 961 | 1 | < 0.1% |
| 960 | 1 | < 0.1% |
| 928 | 1 | < 0.1% |
| 914 | 1 | < 0.1% |
| 905 | 1 | < 0.1% |
created_at
Date
| Distinct | 19767 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.6 KiB |
| Minimum | 2008-01-27 07:09:47 |
| Maximum | 2021-12-20 05:29:41 |
updated_at
Date
| Distinct | 19633 |
|---|---|
| Distinct (%) | 99.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.6 KiB |
| Minimum | 2016-08-08 22:18:09 |
| Maximum | 2023-10-14 14:33:48 |
text_bot_count
Categorical
High correlation  Imbalance 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| 0.00% | 19003 |
|---|---|
| 100.00% | 425 |
| 200.00% | 251 |
| 300.00% | 75 |
| 400.00% | 9 |
Length
| Max length | 7 |
|---|---|
| Median length | 5 |
| Mean length | 5.0773978 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.00% |
|---|---|
| 2nd row | 0.00% |
| 3rd row | 0.00% |
| 4th row | 0.00% |
| 5th row | 0.00% |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.00% | 19003 | 96.1% |
| 100.00% | 425 | 2.1% |
| 200.00% | 251 | 1.3% |
| 300.00% | 75 | 0.4% |
| 400.00% | 9 | < 0.1% |
| 500.00% | 5 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 60069 | 59.8% |
| . | 19768 | 19.7% |
| % | 19768 | 19.7% |
| 1 | 425 | 0.4% |
| 2 | 251 | 0.3% |
| 3 | 75 | 0.1% |
| 4 | 9 | < 0.1% |
| 5 | 5 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 100370 | 100.0% |
log_public_repos
Real number (ℝ)
High correlation  Zeros 
| Distinct | 594 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 82 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.3752039 |
| Minimum | 0 |
| Maximum | 6.9027427 |
| Zeros | 942 |
| Zeros (%) | 4.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.69314718 |
| Q1 | 2.4849066 |
| median | 3.5553481 |
| Q3 | 4.4188406 |
| 95-th percentile | 5.4847969 |
| Maximum | 6.9027427 |
| Range | 6.9027427 |
| Interquartile range (IQR) | 1.933934 |
Descriptive statistics
| Standard deviation | 1.4546625 |
|---|---|
| Coefficient of variation (CV) | 0.43098507 |
| Kurtosis | -0.1869779 |
| Mean | 3.3752039 |
| Median Absolute Deviation (MAD) | 0.92198875 |
| Skewness | -0.49771795 |
| Sum | 66444.265 |
| Variance | 2.116043 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 942 | 4.8% |
| 0.6931471806 | 551 | 2.8% |
| 1.098612289 | 465 | 2.4% |
| 1.386294361 | 396 | 2.0% |
| 1.609437912 | 380 | 1.9% |
| 1.945910149 | 364 | 1.8% |
| 1.791759469 | 357 | 1.8% |
| 2.079441542 | 330 | 1.7% |
| 2.302585093 | 312 | 1.6% |
| 2.197224577 | 307 | 1.6% |
| Other values (584) | 15282 | 77.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 942 | 4.8% |
| 0.6931471806 | 551 | 2.8% |
| 1.098612289 | 465 | 2.4% |
| 1.386294361 | 396 | 2.0% |
| 1.609437912 | 380 | 1.9% |
| 1.791759469 | 357 | 1.8% |
| 1.945910149 | 364 | 1.8% |
| 2.079441542 | 330 | 1.7% |
| 2.197224577 | 307 | 1.6% |
| 2.302585093 | 312 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.902742737 | 1 | < 0.1% |
| 6.900730664 | 1 | < 0.1% |
| 6.893656355 | 1 | < 0.1% |
| 6.876264612 | 1 | < 0.1% |
| 6.856461985 | 1 | < 0.1% |
| 6.848005275 | 2 | < 0.1% |
| 6.835184586 | 1 | < 0.1% |
| 6.829793738 | 1 | < 0.1% |
| 6.820016365 | 1 | < 0.1% |
| 6.795705775 | 1 | < 0.1% |
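The report does not document how the log_* columns were derived, but the values match the natural logarithm of one plus the raw count: a raw 0 stays 0 (which is why the zero counts are identical), a raw 1 becomes 0.6931471806, and the public_repos maximum of 994 becomes 6.902742737. A minimal check of that assumption:

```python
import math

# Raw public_repos values paired with the log_public_repos values shown in the report.
raw_to_reported = {
    0: 0.0,
    1: 0.6931471806,
    994: 6.902742737,
}

for raw, reported in raw_to_reported.items():
    transformed = math.log1p(raw)  # ln(1 + raw)
    matches = math.isclose(transformed, reported, abs_tol=1e-8)
    print(f"log1p({raw}) = {transformed:.9f}  reported = {reported}  match = {matches}")
```

Because this transform is strictly increasing, it preserves the ordering of the raw values, so the distinct, missing and zero counts carry over unchanged.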
log_public_gists
Real number (ℝ)
High correlation  Zeros 
| Distinct | 335 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 24 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3586833 |
| Minimum | 0 |
| Maximum | 6.8721281 |
| Zeros | 7961 |
| Zeros (%) | 40.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.0986123 |
| Q3 | 2.3978953 |
| 95-th percentile | 4.1896547 |
| Maximum | 6.8721281 |
| Range | 6.8721281 |
| Interquartile range (IQR) | 2.3978953 |
Descriptive statistics
| Standard deviation | 1.4757519 |
|---|---|
| Coefficient of variation (CV) | 1.0861633 |
| Kurtosis | -0.20125826 |
| Mean | 1.3586833 |
| Median Absolute Deviation (MAD) | 1.0986123 |
| Skewness | 0.85639656 |
| Sum | 26825.843 |
| Variance | 2.1778436 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7961 | 40.3% |
| 0.6931471806 | 1873 | 9.5% |
| 1.098612289 | 1152 | 5.8% |
| 1.386294361 | 823 | 4.2% |
| 1.609437912 | 665 | 3.4% |
| 1.791759469 | 627 | 3.2% |
| 1.945910149 | 488 | 2.5% |
| 2.079441542 | 405 | 2.0% |
| 2.302585093 | 327 | 1.7% |
| 2.197224577 | 318 | 1.6% |
| Other values (325) | 5105 | 25.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7961 | 40.3% |
| 0.6931471806 | 1873 | 9.5% |
| 1.098612289 | 1152 | 5.8% |
| 1.386294361 | 823 | 4.2% |
| 1.609437912 | 665 | 3.4% |
| 1.791759469 | 627 | 3.2% |
| 1.945910149 | 488 | 2.5% |
| 2.079441542 | 405 | 2.0% |
| 2.197224577 | 318 | 1.6% |
| 2.302585093 | 327 | 1.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.872128101 | 1 | < 0.1% |
| 6.865891075 | 1 | < 0.1% |
| 6.854354502 | 1 | < 0.1% |
| 6.809039306 | 1 | < 0.1% |
| 6.794586581 | 1 | < 0.1% |
| 6.778784898 | 1 | < 0.1% |
| 6.777646594 | 1 | < 0.1% |
| 6.776506992 | 1 | < 0.1% |
| 6.723832441 | 1 | < 0.1% |
| 6.674561392 | 1 | < 0.1% |
log_followers
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 891 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 816 |
| Missing (%) | 4.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.317997 |
| Minimum | 0 |
| Maximum | 6.9077553 |
| Zeros | 1445 |
| Zeros (%) | 7.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2.0794415 |
| median | 3.4339872 |
| Q3 | 4.6539604 |
| 95-th percentile | 6.112464 |
| Maximum | 6.9077553 |
| Range | 6.9077553 |
| Interquartile range (IQR) | 2.5745188 |
Descriptive statistics
| Standard deviation | 1.7718678 |
|---|---|
| Coefficient of variation (CV) | 0.5340173 |
| Kurtosis | -0.76283534 |
| Mean | 3.317997 |
| Median Absolute Deviation (MAD) | 1.3022112 |
| Skewness | -0.17730555 |
| Sum | 62882.678 |
| Variance | 3.1395154 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1445 | 7.3% |
| 0.6931471806 | 803 | 4.1% |
| 1.098612289 | 623 | 3.2% |
| 1.386294361 | 515 | 2.6% |
| 1.609437912 | 450 | 2.3% |
| 1.791759469 | 415 | 2.1% |
| 1.945910149 | 396 | 2.0% |
| 2.079441542 | 347 | 1.8% |
| 2.197224577 | 338 | 1.7% |
| 2.302585093 | 311 | 1.6% |
| Other values (881) | 13309 | 67.3% |
| (Missing) | 816 | 4.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1445 | 7.3% |
| 0.6931471806 | 803 | 4.1% |
| 1.098612289 | 623 | 3.2% |
| 1.386294361 | 515 | 2.6% |
| 1.609437912 | 450 | 2.3% |
| 1.791759469 | 415 | 2.1% |
| 1.945910149 | 396 | 2.0% |
| 2.079441542 | 347 | 1.8% |
| 2.197224577 | 338 | 1.7% |
| 2.302585093 | 311 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.907755279 | 2 | < 0.1% |
| 6.905753276 | 2 | < 0.1% |
| 6.903747258 | 1 | < 0.1% |
| 6.901737207 | 3 | < 0.1% |
| 6.900730664 | 2 | < 0.1% |
| 6.897704943 | 1 | < 0.1% |
| 6.896694332 | 3 | < 0.1% |
| 6.895682698 | 2 | < 0.1% |
| 6.893656355 | 1 | < 0.1% |
| 6.892641641 | 2 | < 0.1% |
log_following
Real number (ℝ)
High correlation  Zeros 
| Distinct | 536 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 84 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8333041 |
| Minimum | 0 |
| Maximum | 6.9057533 |
| Zeros | 6017 |
| Zeros (%) | 30.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 154.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.6094379 |
| Q3 | 3.1021554 |
| 95-th percentile | 4.9333909 |
| Maximum | 6.9057533 |
| Range | 6.9057533 |
| Interquartile range (IQR) | 3.1021554 |
Descriptive statistics
| Standard deviation | 1.7011791 |
|---|---|
| Coefficient of variation (CV) | 0.92793067 |
| Kurtosis | -0.65998577 |
| Mean | 1.8333041 |
| Median Absolute Deviation (MAD) | 1.6094379 |
| Skewness | 0.58379499 |
| Sum | 36086.758 |
| Variance | 2.8940103 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6017 | 30.4% |
| 0.6931471806 | 1734 | 8.8% |
| 1.098612289 | 1092 | 5.5% |
| 1.386294361 | 794 | 4.0% |
| 1.609437912 | 602 | 3.0% |
| 1.791759469 | 533 | 2.7% |
| 1.945910149 | 484 | 2.4% |
| 2.079441542 | 407 | 2.1% |
| 2.197224577 | 368 | 1.9% |
| 2.302585093 | 322 | 1.6% |
| Other values (526) | 7331 | 37.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6017 | 30.4% |
| 0.6931471806 | 1734 | 8.8% |
| 1.098612289 | 1092 | 5.5% |
| 1.386294361 | 794 | 4.0% |
| 1.609437912 | 602 | 3.0% |
| 1.791759469 | 533 | 2.7% |
| 1.945910149 | 484 | 2.4% |
| 2.079441542 | 407 | 2.1% |
| 2.197224577 | 368 | 1.9% |
| 2.302585093 | 322 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.905753276 | 1 | < 0.1% |
| 6.901737207 | 1 | < 0.1% |
| 6.899723107 | 1 | < 0.1% |
| 6.88857246 | 1 | < 0.1% |
| 6.877296071 | 1 | < 0.1% |
| 6.869014451 | 1 | < 0.1% |
| 6.867974409 | 1 | < 0.1% |
| 6.834108739 | 1 | < 0.1% |
| 6.818924065 | 1 | < 0.1% |
| 6.809039306 | 1 | < 0.1% |
Interactions
Correlations
| | blog | company | followers | following | hireable | label | location | log_followers | log_following | log_public_gists | log_public_repos | public_gists | public_repos | text_bot_count | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| blog | 1.000 | 0.258 | 0.318 | 0.193 | 0.218 | 0.024 | 0.369 | 0.414 | 0.361 | 0.369 | 0.372 | 0.126 | 0.264 | 0.062 | 0.080 |
| company | 0.258 | 1.000 | 0.159 | 0.070 | 0.057 | 0.070 | 0.392 | 0.259 | 0.205 | 0.191 | 0.205 | 0.046 | 0.121 | 0.069 | 0.102 |
| followers | 0.318 | 0.159 | 1.000 | 0.539 | 0.156 | 0.075 | 0.234 | 1.000 | 0.539 | 0.581 | 0.642 | 0.581 | 0.642 | 0.028 | 0.052 |
| following | 0.193 | 0.070 | 0.539 | 1.000 | 0.171 | 0.041 | 0.136 | 0.539 | 1.000 | 0.436 | 0.534 | 0.436 | 0.534 | 0.020 | 0.015 |
| hireable | 0.218 | 0.057 | 0.156 | 0.171 | 1.000 | 0.058 | 0.178 | 0.215 | 0.269 | 0.205 | 0.231 | 0.047 | 0.151 | 0.049 | 0.040 |
| label | 0.024 | 0.070 | 0.075 | 0.041 | 0.058 | 1.000 | 0.130 | 0.189 | 0.198 | 0.152 | 0.416 | 0.013 | 0.061 | 0.579 | 0.368 |
| location | 0.369 | 0.392 | 0.234 | 0.136 | 0.178 | 0.130 | 1.000 | 0.397 | 0.369 | 0.309 | 0.364 | 0.073 | 0.203 | 0.131 | 0.124 |
| log_followers | 0.414 | 0.259 | 1.000 | 0.539 | 0.215 | 0.189 | 0.397 | 1.000 | 0.539 | 0.581 | 0.642 | 0.581 | 0.642 | 0.101 | 0.331 |
| log_following | 0.361 | 0.205 | 0.539 | 1.000 | 0.269 | 0.198 | 0.369 | 0.539 | 1.000 | 0.436 | 0.534 | 0.436 | 0.534 | 0.083 | 0.139 |
| log_public_gists | 0.369 | 0.191 | 0.581 | 0.436 | 0.205 | 0.152 | 0.309 | 0.581 | 0.436 | 1.000 | 0.636 | 1.000 | 0.636 | 0.068 | 0.112 |
| log_public_repos | 0.372 | 0.205 | 0.642 | 0.534 | 0.231 | 0.416 | 0.364 | 0.642 | 0.534 | 0.636 | 1.000 | 0.636 | 1.000 | 0.203 | 0.417 |
| public_gists | 0.126 | 0.046 | 0.581 | 0.436 | 0.047 | 0.013 | 0.073 | 0.581 | 0.436 | 1.000 | 0.636 | 1.000 | 0.636 | 0.000 | 0.000 |
| public_repos | 0.264 | 0.121 | 0.642 | 0.534 | 0.151 | 0.061 | 0.203 | 0.642 | 0.534 | 0.636 | 1.000 | 0.636 | 1.000 | 0.022 | 0.041 |
| text_bot_count | 0.062 | 0.069 | 0.028 | 0.020 | 0.049 | 0.579 | 0.131 | 0.101 | 0.083 | 0.068 | 0.203 | 0.000 | 0.022 | 1.000 | 0.510 |
| type | 0.080 | 0.102 | 0.052 | 0.015 | 0.040 | 0.368 | 0.124 | 0.331 | 0.139 | 0.112 | 0.417 | 0.000 | 0.041 | 0.510 | 1.000 |
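In the matrix above, each raw count and its log_* counterpart have a mutual coefficient of 1.000 and identical coefficients against every other numeric column (for example, followers and log_followers both show 0.539 against following). That pattern is what a rank-based coefficient such as Spearman's would produce, since a strictly increasing transform leaves ranks unchanged. The report does not state which coefficient was used, so the sketch below only illustrates the invariance, assuming Spearman's rho and a log1p transform. It uses the nine first-sample rows that have a non-missing followers value; the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# followers / following from sample rows 0, 1 and 3-9 (row 2 has a missing followers value).
followers = [5.0, 9.0, 84.0, 6.0, 22.0, 63.0, 22.0, 37.0, 14.0]
following = [1.0, 6.0, 2.0, 2.0, 7.0, 16.0, 0.0, 596.0, 2.0]
log_followers = [math.log1p(v) for v in followers]

print(spearman(followers, following))
print(spearman(log_followers, following))   # same value: ranks are unchanged by log1p
print(spearman(followers, log_followers))   # 1.0 up to floating-point error
```

A practical consequence is that, for rank-based measures, each raw column and its log counterpart carry the same information, so the duplicated "High correlation" alerts for the log_* columns are expected.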
Missing values
Sample
First rows
| | label | type | site_admin | company | blog | location | hireable | bio | public_repos | public_gists | followers | following | created_at | updated_at | text_bot_count | log_public_repos | log_public_gists | log_followers | log_following |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Human | True | True | False | False | False | False | NaN | 26.0 | 1.0 | 5.0 | 1.0 | 2011-09-26 17:27:03 | 2023-10-13 11:21:10 | 0.00% | 3.295837 | 0.693147 | 1.791759 | 0.693147 |
| 1 | Human | True | True | False | True | False | True | I just press the buttons randomly, and the program evolves... | 30.0 | 3.0 | 9.0 | 6.0 | 2015-06-29 10:12:46 | 2023-10-07 06:26:14 | 0.00% | 3.433987 | 1.386294 | 2.302585 | 1.945910 |
| 2 | Human | True | True | True | True | True | True | Time is unimportant, only life important. | 103.0 | 49.0 | NaN | 221.0 | 2008-08-29 16:20:03 | 2023-10-02 02:11:21 | 0.00% | 4.644391 | 3.912023 | NaN | 5.402677 |
| 3 | Bot | True | True | False | False | True | False | NaN | 49.0 | 0.0 | 84.0 | 2.0 | 2014-05-20 18:43:09 | 2023-10-12 12:54:59 | 0.00% | 3.912023 | 0.000000 | 4.442651 | 1.098612 |
| 4 | Human | True | True | False | False | False | True | NaN | 11.0 | 1.0 | 6.0 | 2.0 | 2012-08-16 14:19:13 | 2023-10-06 11:58:41 | 0.00% | 2.484907 | 0.693147 | 1.945910 | 1.098612 |
| 5 | Human | True | True | True | True | True | False | Done studying. Need challenges. | 56.0 | 1.0 | 22.0 | 7.0 | 2017-04-11 14:08:07 | 2023-10-11 05:59:26 | 0.00% | 4.043051 | 0.693147 | 3.135494 | 2.079442 |
| 6 | Human | True | True | True | True | True | True | Administrator of MOONGIFT that is introducing open source software everyday to Japanese engineers since 2004. | 277.0 | NaN | 63.0 | 16.0 | 2008-04-07 22:22:22 | 2023-09-27 09:04:56 | 0.00% | 5.627621 | NaN | 4.158883 | 2.833213 |
| 7 | Human | True | True | True | False | True | False | Senior Software Engineer at Google, working on Certificate Transparency and generalized transparency. | 37.0 | 1.0 | 22.0 | 0.0 | 2012-01-19 21:57:07 | 2023-08-07 16:06:34 | 0.00% | 3.637586 | 0.693147 | 3.135494 | 0.000000 |
| 8 | Human | True | True | False | False | False | False | NaN | 27.0 | 2.0 | 37.0 | 596.0 | 2019-12-24 20:04:33 | 2023-10-12 11:55:01 | 0.00% | 3.332205 | 1.098612 | 3.637586 | 6.391917 |
| 9 | Human | True | True | True | True | True | False | Hi | 42.0 | 9.0 | 14.0 | 2.0 | 2013-07-23 23:29:34 | 2023-10-09 20:47:05 | 0.00% | 3.761200 | 2.302585 | 2.708050 | 1.098612 |
Last rows
| | label | type | site_admin | company | blog | location | hireable | bio | public_repos | public_gists | followers | following | created_at | updated_at | text_bot_count | log_public_repos | log_public_gists | log_followers | log_following |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19758 | Human | True | True | True | False | True | False | NaN | 30.0 | 0.0 | 10.0 | 11.0 | 2016-09-10 09:45:00 | 2023-10-06 11:30:51 | 0.00% | 3.433987 | 0.000000 | 2.397895 | 2.484907 |
| 19759 | Human | True | True | False | False | True | True | NaN | 37.0 | 19.0 | 91.0 | 6.0 | 2012-04-19 03:27:14 | 2023-10-07 18:13:52 | 0.00% | 3.637586 | 2.995732 | 4.521789 | 1.945910 |
| 19760 | Bot | True | True | False | False | False | False | I am the bot account of @alvaroaleman | 1.0 | 0.0 | 0.0 | 0.0 | 2018-12-15 19:55:31 | 2021-07-27 14:14:25 | 200.00% | 0.693147 | 0.000000 | 0.000000 | 0.000000 |
| 19761 | Human | True | True | False | False | False | False | NaN | 3.0 | 0.0 | 1.0 | 0.0 | 2013-11-10 16:05:37 | 2023-08-31 14:26:08 | 200.00% | 1.386294 | 0.000000 | 0.693147 | 0.000000 |
| 19762 | Human | True | True | False | False | False | False | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 2020-10-01 18:30:32 | 2020-12-29 19:45:12 | 0.00% | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 19763 | Bot | True | True | True | True | True | False | Tony came to Linux in 1994 and has never looked back. His entire professional career has been spent working with or on Linux. First as a systems administrator | 36.0 | 16.0 | 11.0 | 4.0 | 2014-07-02 23:27:34 | 2023-08-15 16:38:34 | 0.00% | 3.610918 | 2.833213 | 2.484907 | 1.609438 |
| 19764 | Human | True | True | False | False | False | False | NaN | 16.0 | 0.0 | 3.0 | 0.0 | 2017-12-06 21:56:31 | 2023-07-26 18:32:25 | 0.00% | 2.833213 | 0.000000 | 1.386294 | 0.000000 |
| 19765 | Human | True | True | True | False | True | False | Software engineer at RealTracs. | 13.0 | 0.0 | 10.0 | 1.0 | 2015-11-14 14:44:05 | 2022-08-23 21:09:49 | 0.00% | 2.639057 | 0.000000 | 2.397895 | 0.693147 |
| 19766 | Human | True | True | True | False | False | False | NaN | 7.0 | 0.0 | 2.0 | 0.0 | 2021-11-23 18:55:29 | 2023-10-06 22:50:45 | 0.00% | 2.079442 | 0.000000 | 1.098612 | 0.000000 |
| 19767 | Bot | True | True | False | False | True | False | NaN | 10.0 | 0.0 | 1.0 | 0.0 | 2016-04-22 22:11:59 | 2022-07-07 19:48:21 | 0.00% | 2.397895 | 0.000000 | 0.693147 | 0.000000 |